DETAILED ACTION

Information Disclosure Statement 

1. The references listed in the Information Disclosure Statement submitted on 07/17/2003
have been considered by the examiner (see attached PTO-1449).

Specification 

2. The disclosure is objected to because of the following informalities: 

a. On page 21, paragraph 55, the variables used for the mapping are inconsistent, and no
explanation is given. For example, one piece uses Vj, Cj and aj, while the other two
pieces use v, o and a. Further, the recited "piece-wise linear function" appears to be
incorrect because the function includes the variable a raised to the second power, which is
non-linear. Appropriate correction and/or clarification is required.

Claim Rejections - 35 USC § 101 
35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or 
any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and 
requirements of this title. 

3. Claims 1-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed 
to non-statutory subject matter. 

Regarding claim 1, it claims "a method" in the preamble, which appears, on the surface, to
fall within the statutory classes (i.e. a process). However, based on the claim language, the terms
"signal", "parameters" and "components" can be interpreted as pure data in a broad sense, and
the claim, as a whole, is substantially drawn to, or reasonably interpreted as, manipulating pure
(abstract) data or an algorithm, which falls within the judicial exceptions to 35 USC 101, i.e. an
abstract idea. Further, since the claim, as a whole, only involves or manipulates pure (abstract)
data or an algorithm and the result is abstract in nature, it fails to produce a useful, tangible,
and concrete result in a practical application. Therefore, the claim, as a whole, is directed to
non-statutory subject matter.

Regarding claims 2-23, the rejection is based on the same reason described for claim 1, 
because these dependent claims include the same or similar problematic limitations as claim 1. 

4. To expedite a complete examination of the instant application, the claims rejected under
35 U.S.C. 101 (non-statutory) above are further rejected as set forth below, in anticipation of
applicant amending these claims to place them within the four statutory categories of invention.

Claim Rejections - 35 USC § 112 
The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject 
matter which the applicant regards as his invention. 

5. Claims 10 and 22-26 are rejected under 35 U.S.C. 112, second paragraph, as being
indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicant regards as the invention. 

Regarding claim 10, the limitation "using a value computed when the components of a
previous frame were processed to determine which of the parameters characterizing the
respective distribution to update" is confusing or unclear. The limitation appears to recite using a
value based on a previously processed frame to determine the current frame parameters for
updating, which does not make sense to the examiner, since the distribution of the current frame may
not be of the same type as that of the previous frame. Therefore, the limitation is indefinite.

Regarding claim 22, it recites the limitation "computing at least an approximation to an 
expected value of the composite Gaussian and signal distribution using the value of the 
component, and the parameters, to obtain a signal-enhanced component." There is insufficient
antecedent basis for this limitation in the claim. 

Regarding claim 23, the rejection is based on the same reason described for claim 22, 
because the dependent claim includes the same or similar problematic limitation(s) as claim 22. 
In addition, the limitation "piece-wise linear function approximation of the expected value" is 
also indefinite because the related disclosure shows the function is non-linear (see closest 
disclosure in p55 of the specification, wherein the function includes a variable a with square 
operation -non-linear function), which conflicts with the claimed limitation. Appropriate 
correction and/or clarification is required. 

Regarding claim 24, it recites the limitation "the value of the component to obtain ..." 
There is insufficient antecedent basis for this limitation in the claim. 

Regarding claims 25-26, the rejection is based on the same reason described for claim 24,
because the dependent claims include the same or similar problematic limitation(s) as claim 24.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such 
that the subject matter as a whole would have been obvious at the time the invention was made to a person having 
ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in
which the invention was made.

6. Claims 1-4, 6, 10-11, 16-18, 22, and 24-25 are rejected under 35 U.S.C. 103(a) as being
unpatentable over ERTEN (US 2002/0116187 A1):

As per claim 1, as best understood in view of the rejection under 35 USC 101 (see 
above), ERTEN discloses 'speech detection' (title), comprising: 

"decomposing a frame of the noise-contaminated signal received in a predefined time 
period into decorrelated signal components" (Fig. 8, and paragraph (hereinafter referenced as
p) 106, 'time window (predefined time period)'; p107, 'frequency converter 158 generates
(decomposes) speech frequency bands ...from windowed speech signal (frame) 152', 'implement
a fast Fourier transform (FFT) algorithm', wherein Fourier transform inherently decomposes the 
windowed signal (frame) into uncorrelated signal components; Fig. 5, shows separation of 
speech 60 and noise 30, which can also be read on the claim); 

"recursively updating respective parameters characterizing a Gaussian noise distribution 
and a signal distribution of each of the respective components as a function of time" (p42,
'parameter matrices' and 'continuous-time dynamics or discrete-time state' (function of time);
p49, 'mixing environment can be modeled as the following nonlinear discrete-time dynamic
processing model (function of time)'; p53, 'the update law for dynamic environments
(corresponding to recursively updating) is used to recover the original signals' and
'environment 42 is modeled as linear dynamical system'; p110, 'voice signals tend to have
Laplacian probability distribution' and 'noise signals... tend to have a Gaussian or
Super-Gaussian probability distribution'; p103, 'properties (also corresponding to parameters) can
convey any information' including 'power, statistical properties, spectral properties, envelop
properties, proximity...');

"using the respective parameters to evaluate a [composite Gaussian and signal] 
distribution function to provide a measure of noise and signal contributions to the component" 
(p110, 'the variance (measure)... may be used to determine (evaluate) the presence of voice
(corresponding to signal contributions)' and 'various other statistical measures, such as kurtosis,
standard deviation ...may be extracted as properties of speech and noise signals or frequency
bands (components)'; Figs. 2-5, 'mixed environment' (corresponding to a distribution function)).

But ERTEN does not expressly disclose the distribution function being a "composite
Gaussian and signal distribution function". However, as stated above, ERTEN teaches that
'voice signals tend to have Laplacian probability distribution' and 'noise signals... tend to have a
Gaussian or Super-Gaussian probability distribution' (p110), and processing the mixed signal in a
'mixed environment' (Figs. 2-5). Therefore, it would have been obvious to one of ordinary skill
in the art at the time the invention was made to recognize that the mixed signal would have a
mixed (joint or composite) distribution that corresponds to the mixed environment, and to
combine the teachings of ERTEN by providing a mixed (joint or composite) distribution that
reflects the mixed signals with properties of a Laplacian probability distribution (for speech) and
a Gaussian probability distribution (for noise) in the mixed environment, because each of speech
and noise has its own probability distribution as suggested by ERTEN, and the mixed signal is
necessarily associated with a mixed (joint or composite) distribution to reflect properties of the
mixed signal and noise distribution in the mixed environment.
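
For illustration only, the notion of a mixed (composite) distribution can be sketched numerically. The sketch below is not taken from ERTEN or from the application; it assumes, hypothetically, that the observed component is the sum of an independent zero-mean Laplacian speech term and a zero-mean Gaussian noise term, in which case the density of the sum is the convolution of the two densities. The function names, the scale b, and the standard deviation sigma are illustrative choices.

```python
import math

def laplace_pdf(x, b):
    """Zero-mean Laplacian density with scale b (variance 2*b*b)."""
    return math.exp(-abs(x) / b) / (2.0 * b)

def gauss_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid(f, lo, hi, steps):
    """Composite trapezoid rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h

def composite_pdf(y, b, sigma, half_width=8.0, steps=800):
    """Density of y = s + n (assumed independent) via numerical convolution."""
    return trapezoid(lambda s: laplace_pdf(s, b) * gauss_pdf(y - s, sigma),
                     -half_width, half_width, steps)

# Hypothetical parameters, chosen only for the illustration.
b, sigma = 0.5, 0.5
mass = trapezoid(lambda y: composite_pdf(y, b, sigma), -8.0, 8.0, 320)
variance = trapezoid(lambda y: y * y * composite_pdf(y, b, sigma), -8.0, 8.0, 320)
print(mass, variance)
```

Under these assumptions the composite density integrates to approximately one and its variance is approximately the sum of the Laplacian variance (2b^2) and the Gaussian variance (sigma^2).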

As per claim 2 (depending on claim 1), the rejection is based on the same reason
described for claim 1, because the rejection for claim 1 covers the same or similar limitation(s)
as claim 2.

As per claim 3 (depending on claim 1), the rejection is based on the same reason 
described for claim 1, because the rejection for claim 1 covers the same or similar limitation(s) 
as claim 3, wherein 'time window' and 'windowed speech signals' inherently include the 
claimed "a predefined number of samples" and FFT also inherently includes the claimed 
"applying a matrix transform". 

As per claim 4 (depending on claim 1), the rejection is based on the same reason 
described for claim 1, because the rejection for claim 1 covers the same or similar limitation(s) 
as claim 4, wherein 'FFT' inherently includes the claimed "mapping...from a time domain to a
frequency domain". 

As per claim 6 (depending on claim 1), the rejection is based on the same reason 
described for claim 1, because the rejection for claim 1 covers the same or similar limitation(s) 
as claim 6, wherein 'Fourier transform' inherently includes the sinusoidal functions as basis
functions as claimed. 
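
For illustration only, the relationship between a Fourier transform, a matrix transform, and sinusoidal basis functions can be sketched as follows. The sketch is not taken from ERTEN; the frame length of 8 samples and the test signal are hypothetical. It builds the discrete Fourier transform as an explicit matrix whose rows are complex sinusoids and applies that matrix to one frame.

```python
import cmath
import math

def dft_matrix(n):
    """n-by-n DFT matrix; row k is the complex sinusoid exp(-2*pi*i*k*m/n)."""
    return [[cmath.exp(-2j * math.pi * k * m / n) for m in range(n)]
            for k in range(n)]

def apply_matrix(matrix, frame):
    """Matrix-vector product: maps a time-domain frame to frequency components."""
    return [sum(w * x for w, x in zip(row, frame)) for row in matrix]

# Hypothetical frame: a cosine completing two cycles over 8 samples.
n = 8
frame = [math.cos(2 * math.pi * 2 * m / n) for m in range(n)]
coeffs = apply_matrix(dft_matrix(n), frame)
magnitudes = [abs(c) for c in coeffs]
print([round(v, 6) for v in magnitudes])
```

For this test frame the energy is concentrated in frequency bins 2 and 6 (the mirror bin), which is what mapping a sinusoid onto sinusoidal basis functions would be expected to produce. An FFT computes the same product with fewer operations.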

As per claim 10 (depending on claim 2), as best understood in view of the rejection under
35 USC 112, second paragraph (see above), ERTEN discloses "a value computed when the components of a
previous frame were processed to determine which of the parameters characterizing the
respective distribution to update" (Fig. 7 and p104-105, 'detection parameter... may be
scaled... or... a binary value', which is used to 'attenuates (update) extracted speech signal'; also
see Fig. 8 and p108).

As per claim 11 (depending on claim 10), ERTEN does not expressly disclose "wherein
the previously computed value is an a priori probability of the frame constituting noise, and
using the a priori probability to determine which of the parameters to update comprises: selecting
a measure of variance that characterizes the Gaussian noise distribution if the a priori probability
is below a predetermined threshold; and otherwise selecting a measure of variance factor that
characterizes the Laplacian distribution." However, ERTEN teaches using 'probability
density of the Jth component (interpreted as a priori probability of components, including noise)'
(p47); 'speech likelihood signal may be a binary signal or may express some probability that
speech has been detected' (p114); 'a binary value resulting from comparing the operation results
to one or more threshold values' (p104); 'voice signals tend to have Laplacian probability
distribution... noise signals... tend to have a Gaussian or Super-Gaussian probability
distribution... thus voice signals can be said to be of low variance', 'the variance of extracted
speech signal or speech frequency bands may be used to determine the presence of voice' and
'various other statistical measures... may be extracted as properties of speech and noise signal or
frequency bands' (p110). Therefore, it would have been obvious to one of ordinary skill in the
art at the time the invention was made to recognize that the likelihood signal expressed by
probability can be an a priori probability and is associated with Laplacian (for speech) and/or
Gaussian (for noise) probability distribution using the corresponding variance, and to combine
the teachings of ERTEN by providing (a priori) probability and the associated Laplacian (for
speech) or Gaussian (for noise) probability distributions using variance, as suggested by ERTEN,
for the purpose (motivation) of using various statistical measures for extracting properties of
speech and noise and/or producing separated speech and noise signals from a mixed signal
(ERTEN: p110 and p39).

As per claim 16 (depending on claim 11), as stated above, ERTEN discloses "computing a
measure of fit of the components to a composite Gaussian and Laplacian distribution" (as
described for claim 1; also see ERTEN: p103 and p110).

As per claim 17 (depending on claim 16), ERTEN further discloses "computing a 
measure of fit of each of the received components to a respective Gaussian noise distribution 
defined using the respective parameters; and comparing a mean of the measures of fit to the 
respective Gaussian noise distributions with a mean of the measures of fit to the composite 
Gaussian and Laplacian distributions, to compute a likelihood that the components of the frame 
constitute noise or noise-contaminated voice signal" (ERTEN: p103, 'properties (measures or
parameters)... may include... statistical properties (necessarily including mean value),
...averages (broadly interpreted as mean values)... model fitting values (including measure of
fit)'; p110, 'various other statistical measures'; Fig. 5 and p90, 'generates (comparing
result)... the difference between sound signal (corresponding to the composite Gaussian and
Laplacian distributions) from microphone m2 and filtered noise signal (corresponding to
Gaussian noise distributions)'; p113-p114, 'speech detected signal has such noise periods
attenuated' (detecting noise) and 'speech likelihood signal may be a binary signal' (implying
either speech with noise or background noise only); which corresponds to the claim).

As per claim 18 (depending on claim 17), ERTEN discloses "evaluating the distribution
at the value of the component received" (for the same reason described above; also see ERTEN:
p110).

Application/Control Number: 10/620,453
Art Unit: 2626

As per claim 22 (depending on claim 1), as best understood in view of the rejection under
35 USC 112, second paragraph (see above), ERTEN does not expressly disclose "computing at least an
approximation to an expected value of the composite Gaussian and signal distribution using the
value of the component, and the parameters, to obtain a signal-enhanced component, if it is
determined that the frame is signal active". However, ERTEN teaches generating 'one or more
noise signal properties' including 'statistical properties... average (approximation to an expected
value)... model fit values (can also include approximation to an expected value)' (p103), using
'Gaussian' and 'Laplacian probability distributions' with 'various statistical measures (including
approximation to an expected value, such as the corresponding estimated sample value)' to
'determine the presence of voice' (p110); 'speech likelihood signal' and 'speech detector'
(p114); and extracting 'noise signal' and producing 'detected speech signal (obtain a
signal-enhanced component)' (Fig. 5 and p90; and Fig. 7 and p105). Therefore, it would have been
obvious to one of ordinary skill in the art at the time the invention was made to recognize that a
temporal (or ergodic) value of test samples can be used as an approximation of a statistical
expected (ensemble) value, such as a time average being an approximation of a mean (statistical
expected) value, and to combine the different teachings of ERTEN by providing an
approximation to an expected value with Laplacian (for speech) and Gaussian (for noise) probability
distributions, such as a time average, as suggested by ERTEN, for the purpose (motivation) of
extracting properties of speech and noise and/or producing separated speech and noise signals
from a mixed signal (ERTEN: p110 and p39).
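
For illustration only, the statement that a time average can approximate a statistical expected value can be sketched as follows. The sketch is not taken from ERTEN; the Laplacian source, its mean mu, its scale b, the sample count, and the fixed seed are all hypothetical choices, and the samples are assumed independent and identically distributed so that the time average converges to the ensemble mean.

```python
import random

def draw_laplace(rng, mu, b):
    """One Laplacian sample: the difference of two exponentials with mean b, shifted by mu."""
    return mu + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

def time_average(samples):
    """Arithmetic mean over the observed sequence (the temporal average)."""
    return sum(samples) / len(samples)

# Hypothetical parameters; the seed only makes the run repeatable.
rng = random.Random(0)
mu, b = 1.5, 1.0
samples = [draw_laplace(rng, mu, b) for _ in range(20000)]
estimate = time_average(samples)
print(estimate, mu)
```

With 20000 samples the standard error of the average is about 0.01 for these parameters, so the estimate would be expected to fall close to the assumed mean of 1.5.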

As per claim 24, it recites an apparatus. As best understood in view of the rejection
under 35 USC 112, second paragraph (see above), the rejection is based on the same reason described for claims
1 and 22, because the rejection for claims 1 and 22 covers the same or similar limitation(s) as
claim 24 (wherein 'speech likelihood signal' and 'speech detector' (p114) are read on "voice
activity detector" with the associated functionality as claimed), except the limitation "an inverse
signal transform for re-composing the frame of samples". However, this feature is further
disclosed by ERTEN (p40, 'transform function inversion'; Fig. 8 and p109, 'combiner 170
performs... by an inverse-FFT to generate detected speech signal 34').

As per claim 25 (depending on claim 24), ERTEN discloses "the clean speech estimator
computes an expected value of each of the composite Gaussian and Laplacian distributions to
independently derive a speech-enhanced component corresponding to each of the components"
(p110, 'the variance (expected value) of extracted speech signal 28 or speech frequency bands
(components) may be used to determine (evaluate) the presence of voice (corresponding to signal
contributions)' and 'various other statistical measures, such as kurtosis, standard deviation (also
expected values)... may be extracted as properties of speech and noise signals or frequency
bands (components)'; Fig. 8 and p108-p109, 'any property of speech frequency band or noise
frequency band may be used' including 'statistical properties'; 'combiner 170 combines
frequency band output (a speech-enhanced component) 168 for each speech frequency band 160
to generate detected speech signal').

7. Claims 5, 7-9 and 26 are rejected under 35 U.S.C. 103(a) as being unpatentable over
ERTEN in view of admitted prior art disclosure, hereinafter referenced as ADMISSION. 

As per claim 5 (depending on claim 4), ERTEN does not expressly disclose "mapping
the frame comprises applying a discrete cosine transform to the frame of samples". However,
the feature is well known in the art as evidenced by ADMISSION who teaches that 'there are 
many known transforms for decomposing (mapping) a frame of samples' and 'the most common 
of these include the frequency-domain transforms such as the Fourier transform, and the discrete 
cosine transform (DCT), wavelet decomposition transforms such as the standard wavelet 
transform (SWT), and adaptive transforms like the Karhunen-Loeve Transform' (p5-p6 in the 
section of "Background of the Invention" of the specification). Therefore, it would have been 
obvious to one of ordinary skill in the art at the time the invention was made to modify ERTEN 
by providing a transform using DCT for the decomposition, as taught by ADMISSION, for the 
purpose (motivation) of providing low complexity decomposition technique (ADMISSION: p6). 

As per claim 7 (depending on claim 6), ERTEN does not expressly disclose decomposing 
the frame into "wavelets". However, the feature is well known in the art as evidenced by 
ADMISSION who teaches that 'there are many known transforms for decomposing (mapping) a 
frame of samples' and 'the most common of these include the frequency-domain transforms such 
as the Fourier transform, and the discrete cosine transform (DCT), wavelet decomposition 
transforms such as the standard wavelet transform (SWT), and adaptive transforms like the 
Karhunen-Loeve Transform' (see p5 and p7 in the section of "Background of the Invention" of 
the specification). Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to modify ERTEN by providing a wavelet transform for the
decomposition, as taught by ADMISSION, for the purpose (motivation) of better representing
discontinuities for the signal (ADMISSION: p7).

As per claims 8-9 (depending on claim 6), ERTEN does not expressly disclose 
"recomputing the basis functions to adaptively optimize decomposition" and "applying an 
adaptive Karhunen-Loeve transform". However, the feature is well known in the art as evidenced 
by ADMISSION who teaches that 'there are many known transforms for decomposing 
(mapping) a frame of samples' and 'the most common of these include the frequency-domain 
transforms such as the Fourier transform, and the discrete cosine transform (DCT), wavelet 
decomposition transforms such as the standard wavelet transform (SWT), and adaptive 
transforms like the Karhunen-Loeve Transform' (p5 and p7 in the section of "Background of the 
Invention" of the specification). Therefore, it would have been obvious to one of ordinary skill
in the art at the time the invention was made to modify ERTEN by providing an adaptive
Karhunen-Loeve transform for the decomposition, as taught by ADMISSION, for the purpose
(motivation) of maximizing the capacity of the basis functions to represent the signal (ADMISSION: p7).

As per claim 26 (depending on claim 25), the rejection is based on the same reason 
described for claim 5, because the claim recites the same or similar limitation(s) as claim 5. 

8. Claims 12-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over ERTEN in 
view of VALVE et al. (US 6,707,910 B1), hereinafter referenced as VALVE.

As per claim 12 (depending on claim 11), ERTEN does not expressly disclose "the a
priori probability is defined by evaluating a hidden state of a hidden Markov model". However,
the feature is well known in the art as evidenced by VALVE, who discloses 'detection of the
speech activity of a source' (title), comprising using 'HMMs (hidden Markov models, i.e. statistical
models)' having 'probability density function (pdf: corresponding to a priori probability)' for
'speech activity detection' (col. 9, lines 22-49). Therefore, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to modify ERTEN by providing 
HMMs with pdfs for speech activity detection, as taught by VALVE, for the purpose 
(motivation) of improving speech activity detection by utilizing statistical information (VALVE: 
col. 9, lines 9-13). 

As per claim 13 (depending on claim 12), ERTEN in view of VALVE discloses 
"incrementally changing the parameter in accordance with a difference between an expected 
value of the component given the past value of the parameter, and the value of the component 
received" (ERTEN: p53, 'the update law for (dynamic incrementally changing) environments is
used to recover the original signals' and 'environment 42 is modeled as linear dynamical
system'; p110, 'statistical measures (parameters)', such as 'variance', 'kurtosis' and 'standard
deviation', can be interpreted as expected values; wherein HMMs inherently include determining
difference(s) (state changing) between current parameter(s) and the past value of the
parameter(s), as claimed).



Conclusion 

9. Please address mail to be delivered by the United States Postal Service (USPS) as 
follows: 

Mail Stop 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 
or faxed to: 571-273-8300 (for formal communications intended for entry)
Or: 571-273-8300 (for informal or draft communications, and please label
"PROPOSED" or "DRAFT")
If no Mail Stop is indicated below, the line beginning Mail Stop should be omitted from the 
address. 

Effective January 14, 2005, except correspondence for Maintenance Fee payments, 
Deposit Account Replenishments (see 37 CFR 1.25(c)(4)), and Licensing and Review (see 37 CFR 5.1(c)
and 5.2(c)), please address correspondence to be delivered by other delivery services (Federal
Express (Fed Ex), UPS, DHL, Laser, Action, Purolator, etc.) as follows:

U.S. Patent and Trademark Office 

Customer Window, Mail Stop 

Randolph Building 

Alexandria, VA 22314

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Qi Han, whose telephone number is (571) 272-7604. The
examiner can normally be reached on Monday through Thursday from 9:00 a.m. to 7:30 p.m. If 
attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, 
Richemond Dorvil, can be reached on (571) 272-7602. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Inquiries regarding the status of submissions 
relating to an application or questions on the Private PAIR system should be directed to the 
Electronic Business Center (EBC) at 866-217-9197 (toll-free) or 703-305-3028 between the 
hours of 6 a.m. and midnight Monday through Friday EST, or by e-mail at: ebc@uspto.gov. For 
general information about the PAIR system, see http://pair-direct.uspto.gov.

QH/qh 

February 17, 2007 



